---
title: Evaluate experiments (Experiments tab)
description: Describes how to filter the Leaderboard and use visualization tools to evaluate models in the DataRobot Workbench interface.

---

# Evaluate experiments (Experiments tab) {: #evaluate-experiments-experiments-tab }

Once you start modeling, Workbench begins to construct your model Leaderboard, a list of models ranked by performance, to help with quick model evaluation. The Leaderboard provides a summary of information, including scoring information, for each model built in an experiment. From the Leaderboard, you can click a model to access visualizations for further exploration. Using these tools can help to assess what to do in your next experiment.

There are two "flavors" of Leaderboard available.

![](images/wb-lb-experiment.png)

*  This page describes the **Experiment** tab, which helps to understand and evaluate models from a single experiment.
*  See also the [**Comparison**](wb-model-compare) tab page, which allows you to compare up to three models of the same type (for example, binary, regression) from any number of experiments within a single Use Case. Access the comparison tool from the tab or from the dropdown on the experiment name in the breadcrumbs.

After Workbench completes [Quick mode](model-data#modeling-modes-explained){ target=_blank } on the 64% sample size phase, the most accurate model is selected and trained on 100% of the data. That model is marked with the [**Prepared for Deployment**](model-rec-process#prepare-a-model-for-deployment){ target=_blank } badge.

![](images/wb-exp-eval-1.png)

Visualizations and information are provided in the **Model overview** section of the window. Click [**View experiment info**](#view-experiment-info) to view experiment details or click any model to access [insights](#model-insights).

## Manage the Leaderboard {: #manage-the-leaderboard }

There are several controls available, described in the next sections, for navigating the Leaderboard.

* [View experiment info](#view-experiment-info)
* [Filter models](#filter-models)
* [Sort models by](#sort-models-by)
* [Controls](#controls)

### View experiment info {: #view-experiment-info }

!!! info "Availability information"
    The **Data** and **Feature lists** tabs are Preview options that are on by default.

    **Feature flag:** Enable Data and Feature Lists tabs in Workbench

Click **View experiment info** to access tabs that:

* Provide a summary of information about the experiment's [setup](#setup-tab).
* Display the [data](#data-tab) used to build models for the experiment.
* Show [Feature lists](#feature-lists-tab) built for the experiment and available for model training.
* Open the [blueprint repository](ml-experiment-add#blueprint-repository), which provides access to additional blueprints for training.

![](images/wb-exp-eval-2.png)


#### Setup tab {: #setup-tab }

The **Setup** tab reports the parameters used to build the models on this Leaderboard.

Field | Reports...
----- | -----------
Created | A time stamp indicating when the experiment was created, as well as the user who initiated the model run.
Dataset | The name, number of features, and number of rows in the modeling dataset. This is the same information available from the [data preview](wb-data-tab) page.
Target | The feature selected as the basis for predictions, the resulting project type, and the [optimization metric](opt-metric){ target=_blank } used to define how to score the experiment's models. You can [change the metric](#sort-models-by) the Leaderboard is sorted by, but the metric displayed in the summary is the one used for the build.
Partitioning | Details of the partitioning done for the experiment, either the default or [modified](ml-experiment-create#data-partitioning-tab){ target=_blank }.
Additional settings | Advanced settings that were configured from the [**Additional settings**](ml-experiment-create#configure-additional-settings){ target=_blank } tab.

#### Data tab {: #data-tab }

The **Data** tab provides [summary analytics](model-ref#data-summary-information) of the data used in the project. To [view exploratory data insights](wb-data-tab), click the **dataset preview** link.

![](images/wb-exp-eval-30.png)

By default, the display includes all features in the dataset. You can view analytics only for features specific to a feature list by toggling **Filter by feature list** and then selecting a list:

![](images/wb-exp-eval-32.png)

Click on the arrow or three dots next to a column name to change the sort order.


#### Feature lists tab {: #feature-lists-tab }

Click the **Feature lists** tab to view all feature lists associated with the experiment. The display shows both DataRobot's [automatically created](feature-lists#automatically-created-feature-lists) lists and any [custom](wb-data-tab#create-a-feature-list) feature lists that were created prior to model training.

![](images/wb-exp-eval-31.png)
<br>
The following actions are available for feature lists:

Action | Description
------ | -----------
View features | Explore insights for a feature list. This selection opens the **Data** tab with the filter set to the selected list.
Edit name and description | Opens a dialog to change the list name and description. You cannot change these values for a DataRobot-created list.
Download | Downloads the features contained in that list as a CSV file.
Rerun modeling | Opens the **Rerun modeling** modal to allow selecting a new feature list and restarting Autopilot.  


#### Blueprints repository tab {: #blueprints-repository-tab }

{% include 'includes/blueprint-repo.md' %}


### Filter models {: #filter-models }

Filtering makes it easier to view and focus on relevant models. Click **Filter models** to set the criteria for the models that Workbench displays on the Leaderboard. The choices available for each filter depend on the experiment and/or model type (only options used in at least one Leaderboard model appear) and can change as models are added to the experiment. For example:

Filter | Displays models that...
----- | -------------
Labeled models | Have been assigned the listed tag, either [starred models](leaderboard-ref#tag-and-filter-models){ target=_blank } or models [recommended for deployment](model-rec-process#prepare-a-model-for-deployment){ target=_blank }.
Feature list | Were built with the selected feature list.
Sample size (random or stratified partitioning)  | Were trained on the selected sample size.
Training period (date/time partitioning)   | Were trained on backtests defined by the selected duration mechanism.
Model family | Are part of the selected model family: <ul><li>GBM (Gradient Boosting Machine), such as Light Gradient Boosting on ElasticNet Predictions, eXtreme Gradient Boosted Trees Classifier </li><li>GLMNET (Lasso and ElasticNet regularized generalized linear models), such as Elastic-Net Classifier, Generalized Additive2</li><li>RI (Rule induction), such as RuleFit Classifier</li><li>RF (Random Forest), such as RandomForest Classifier or Regressor</li><li>NN (Neural Network), such as Keras</li></ul>

### Sort models by {: #sort-models-by }

By default, the Leaderboard sorts models based on the score of the validation partition, using the selected [optimization metric](opt-metric){ target=_blank }. You can, however, use the **Sort models by** control to change the metric used to order the display when evaluating models.

Note that although Workbench built the project using the most appropriate metric for your data, it computes many applicable metrics for each model. After the build completes, you can redisplay the Leaderboard based on a different metric. Doing so does not change any values within the models; it simply reorders the model listing based on their performance against the alternate metric.

![](images/wb-exp-eval-4.png)

See the page on [optimization metrics](opt-metric){ target=_blank } for detailed information on each.
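The reordering described above can be sketched in plain Python. This is a generic illustration, not the DataRobot API; the model names and scores below are made up for the example:

```python
# Illustration: re-sorting a Leaderboard-style listing by a different metric.
# Re-sorting never changes the stored scores, only the display order.
models = [
    {"name": "eXtreme Gradient Boosted Trees", "LogLoss": 0.42, "AUC": 0.91},
    {"name": "Elastic-Net Classifier",         "LogLoss": 0.45, "AUC": 0.93},
    {"name": "RandomForest Classifier",        "LogLoss": 0.44, "AUC": 0.89},
]

def sort_models(models, metric, ascending=True):
    """Reorder the listing by `metric` without touching the scores."""
    return sorted(models, key=lambda m: m[metric], reverse=not ascending)

by_logloss = sort_models(models, "LogLoss")           # lower is better
by_auc = sort_models(models, "AUC", ascending=False)  # higher is better
```

Note that whether "better" means higher or lower depends on the metric: error metrics such as LogLoss sort ascending, while metrics such as AUC sort descending.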

### Controls {: #controls }

Workbench provides simple, quick shorthand controls:

Icon | Action
---- | ------
![](images/icon-wb-rerun.png)| Reruns Quick mode with a different feature list. If you select a feature list that has already been run, Workbench replaces any deleted models or, if none were deleted, makes no changes.
![](images/icon-wb-duplicate.png) | [Duplicates the experiment](manage-projects#duplicate-a-project), with an option to reuse just the dataset, or the dataset and settings.
![](images/icon-wb-delete.png) | Deletes the experiment and its models. If the experiment is being used by an application, you cannot delete it.
![](images/icon-wb-close.png) | Slides the Leaderboard panel closed to make additional room, for example, for viewing insights.


## Model insights  {: #model-insights }

Model insights help to interpret, explain, and validate what drives a model’s predictions. Available insights are dependent on experiment type, but may include the insights listed in the predictive modeling insights table below. Availability of [sliced insights](sliced-insights) is also model-dependent.


!!! info "Availability information"
    The following insights are Public Preview in Workbench.

    * Sliced insights are _off_ by default. Contact your DataRobot representative or administrator for information on enabling them. **Feature flag:** Slices in Workbench
    * Feature Effects is _off_ by default. Contact your DataRobot representative or administrator for information on enabling it. **Feature flag:** Slices in Workbench
    * SHAP Prediction Explanations are _on_ by default. **Feature flag:** SHAP in Workbench


Insight | Description | Problem type | Sliced insights?
------- | ----------- | ------------ | ----------------
[Blueprint](#blueprint) | Provides a graphical representation of data preprocessing and parameter settings. | All |
[Feature Effects](#feature-effects) | Conveys how changes to the value of each feature change model predictions. | All | ✔
[Feature Impact](#feature-impact) | Shows which features are driving model decisions. | All | ✔
[Lift Chart](#lift-chart) | Depicts how well a model segments the target population and how capable it is of predicting the target.| All |  ✔
[Residuals](#residuals) | Provides scatter plots and a histogram for understanding model predictive performance and validity. | Regression | ✔
[ROC Curve](#roc-curve) | Provides tools for exploring classification, performance, and statistics related to a model. | Classification | ✔
[SHAP Prediction Explanations](#shap-prediction-explanations) | Estimates how much each feature contributes to a given prediction, with values based on difference from the average. | Classification, regression |

To see a model's insights, click on the model in the left-pane Leaderboard. Note that different insights are available for [time-aware experiments](ts-experiment-evaluate#model-insights).

### Accuracy Over Time  {: #accuracy-over-time }

For time-aware projects, [Accuracy Over Time](aot){ target=_blank } helps to visualize how predictions change over time. By default, the view shows predicted and actual vs. time values for the training and validation data of the most recent (first) backtest. This is the backtest model DataRobot uses to deploy and make predictions. (In other words, the model used to generate the error metric for the validation set.)

![](images/wb-ts-eval-13.png)

The visualization also has a time-aware [Residuals](aot#interpret-the-residuals-chart){ target=_blank } tab that plots the difference between actual and predicted values. It helps to visualize whether there is an unexplained trend in your data that the model did not account for and how the model errors change over time.

![](images/wb-exp-eval-19.png)

### Blueprint {: #blueprint }

Blueprints are ML pipelines containing preprocessing steps (tasks), modeling algorithms, and post-processing steps that go into building a model. The [Blueprint](blueprints){ target=_blank } tab provides a graphical representation of the blueprint, showing each step. Click on any task in the blueprint to see more detail, including more complete model documentation (by clicking **DataRobot Model Docs** from inside the blueprint’s task).

![](images/wb-exp-eval-21.png)

Additionally, you can access the [blueprint repository](ml-experiment-add#blueprint-repository) from the **Blueprint** tab:

![](images/wb-exp-eval-22.png)

### Feature Effects {: #feature-effects }

The [Feature Effects](feature-effects){ target=_blank } insight shows the effect that changes in the value of each feature have on model predictions: that is, how the model "understands" the relationship between each feature and the target. It is an on-demand insight, dependent on the [Feature Impact](feature-impact){ target=_blank } calculation, which you are prompted to run when first opening the visualization. The insight is communicated in terms of [partial dependence](feature-effects#partial-dependence-logic){ target=_blank }, an illustration of how changing a feature's value, while keeping all other features as they were, impacts a model's predictions.

![](images/wb-exp-eval-29.png)
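The partial dependence idea can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not DataRobot's implementation; the toy model and data are made up for the example:

```python
import numpy as np

# Sketch of partial dependence: vary one feature across a grid while keeping
# every other feature at its observed values, then average the predictions.
def partial_dependence(predict, X, feature_idx, grid):
    pd_curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # override one feature everywhere
        pd_curve.append(predict(X_mod).mean())
    return np.array(pd_curve)

# Toy model: the prediction depends linearly on feature 0 only.
predict = lambda X: 2.0 * X[:, 0]
X = np.random.default_rng(0).normal(size=(100, 2))
grid = np.array([0.0, 1.0, 2.0])
pd_curve = partial_dependence(predict, X, feature_idx=0, grid=grid)
# pd_curve is [0.0, 2.0, 4.0]: the average response as feature 0 changes
```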

### Feature Impact {: #feature-impact }

{% include 'includes/feature-impact-include.md' %}

![](images/wb-exp-eval-5.png)
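The general idea behind impact-style insights is permutation importance: shuffle one feature at a time and measure how much the model's error grows. The sketch below is a generic illustration of that technique with a made-up model and data, not DataRobot's exact computation:

```python
import numpy as np

# Minimal permutation-importance sketch: shuffling a feature breaks its
# relationship with the target; the resulting increase in error indicates
# how much the model relies on that feature.
def permutation_importance(predict, X, y, metric, seed=0):
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(X.shape[1]):
        X_perm = X.copy()
        rng.shuffle(X_perm[:, j])  # break the feature's signal
        importances.append(metric(y, predict(X_perm)) - baseline)
    return np.array(importances)

mse = lambda y, p: float(np.mean((y - p) ** 2))
X = np.random.default_rng(1).normal(size=(500, 2))
y = 3.0 * X[:, 0]             # only feature 0 matters in this toy setup
predict = lambda X: 3.0 * X[:, 0]
imp = permutation_importance(predict, X, y, mse)
# imp[0] is large; imp[1] is zero, since the model ignores feature 1
```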

### Lift Chart {: #lift-chart }

{% include 'includes/lift-chart-include.md' %}

![](images/wb-exp-eval-17.png)
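A lift chart is assembled by sorting rows by predicted value, splitting them into equal-sized bins, and comparing the average predicted and average actual value per bin. The sketch below illustrates that construction with synthetic data (not DataRobot's exact binning):

```python
import numpy as np

# Sketch of lift-chart construction: a model that segments the population
# well produces bins whose averages rise steadily from left to right.
def lift_bins(predicted, actual, n_bins=10):
    order = np.argsort(predicted)
    pred_bins = np.array_split(predicted[order], n_bins)
    act_bins = np.array_split(actual[order], n_bins)
    mean_pred = np.array([b.mean() for b in pred_bins])
    mean_act = np.array([b.mean() for b in act_bins])
    return mean_pred, mean_act

rng = np.random.default_rng(2)
actual = rng.normal(loc=10, scale=3, size=1000)
predicted = actual + rng.normal(scale=1.0, size=1000)  # a decent model
mean_pred, mean_act = lift_bins(predicted, actual)
# Both curves increase across bins when the model ranks rows well
```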

### Period Accuracy {: #period-accuracy }

[Period Accuracy](ts-period-accuracy) lets you define periods within your dataset and then compare their metric scores against the metric score of the model as a whole. In other words, you can specify the more important periods within your training dataset, and DataRobot then provides aggregate accuracy metrics for those periods and surfaces the results on the Leaderboard. Periods are defined in a separate CSV file that identifies which rows to group, based on the project's date/time feature. Once the file is uploaded and the insight is calculated, DataRobot provides a table of period-based results and an "over time" histogram for each period.


![](images/period-accuracy-wb1.png)
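The grouping described above can be sketched in plain Python. The period labels, actuals, and predictions below are made up for illustration; in the product, the grouping comes from the uploaded CSV keyed on the date/time feature:

```python
from statistics import mean

# Sketch of the Period Accuracy idea: compute an accuracy metric per
# named period and compare each score against the overall score.
rows = [
    ("holiday", 120.0, 110.0),  # (period, actual, predicted)
    ("holiday", 130.0, 128.0),
    ("regular",  80.0,  81.0),
    ("regular",  75.0,  74.0),
]

def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return mean(abs(a - p) for a, p in pairs)

by_period = {}
for period, actual, predicted in rows:
    by_period.setdefault(period, []).append((actual, predicted))

overall = mae([(a, p) for _, a, p in rows])
per_period = {name: mae(pairs) for name, pairs in by_period.items()}
# Comparing per_period scores against the overall score highlights periods
# where the model under-performs (here, "holiday" has the larger error).
```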


### Residuals {: #residuals }

For regression experiments, the [Residuals](residuals){ target=_blank } tab helps to clearly understand a model's predictive performance and validity. It allows you to gauge how linearly your models scale relative to the actual values of the dataset used. It provides multiple scatter plots and a histogram to assist your residual analysis:

* Predicted vs. Actual
* Residual vs. Actual
* Residual vs. Predicted
* Residuals histogram

![](images/wb-exp-eval-7.png)
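The quantity behind all four plots is the residual: actual minus predicted. A minimal sketch with synthetic data (purely for illustration):

```python
import numpy as np

# Residual = actual - predicted. For a good model, Predicted vs. Actual
# hugs the diagonal, and Residual vs. Predicted shows no trend, with the
# errors centered on zero.
rng = np.random.default_rng(3)
actual = rng.normal(loc=50, scale=10, size=200)
predicted = actual + rng.normal(scale=2.0, size=200)  # small random error

residuals = actual - predicted
# residuals.mean() is near 0 for an unbiased model; a consistently
# positive or negative mean indicates systematic under- or over-prediction.
```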

### ROC Curve {: #roc-curve }

For classification experiments, the ROC Curve tab provides the following tools for exploring classification, performance, and statistics related to a selected model at any point on the probability scale:

* An [ROC Curve](roc-curve){ target=_blank }
* [Cumulative charts](cumulative-charts){ target=_blank }
* A [confusion matrix](confusion-matrix){ target=_blank }
* A [payoff matrix/profit curve](profit-curve){ target=_blank }
* [Metrics](metrics){ target=_blank }

![](images/wb-exp-eval-6.png)
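The statistics underlying these tools derive from the confusion matrix at a chosen probability threshold. The sketch below shows that relationship with made-up scores and labels, as a generic illustration rather than DataRobot's implementation:

```python
import numpy as np

# At any threshold, predictions split into a confusion matrix, which yields
# the true-positive rate (y-axis) and false-positive rate (x-axis) plotted
# on the ROC curve.
def confusion_at_threshold(probs, labels, threshold):
    preds = probs >= threshold
    tp = int(np.sum(preds & (labels == 1)))
    fp = int(np.sum(preds & (labels == 0)))
    fn = int(np.sum(~preds & (labels == 1)))
    tn = int(np.sum(~preds & (labels == 0)))
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "tpr": tpr, "fpr": fpr}

# Toy scores: positives tend to receive higher probabilities than negatives.
probs = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2])
labels = np.array([1, 1, 0, 1, 0, 0])
stats = confusion_at_threshold(probs, labels, threshold=0.5)
# At threshold 0.5: tp=2, fp=1, fn=1, tn=2
```

Sweeping the threshold from 1 down to 0 and plotting (fpr, tpr) at each step traces out the ROC curve itself.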

### SHAP Prediction Explanations {: #shap-prediction-explanations }

For non-time series projects, [SHAP Prediction Explanations](shap-pe) illustrate what drives predictions on a row-by-row basis. They provide a quantitative indicator of the effect variables have on the predictions, answering why a model made a certain prediction.

{% include 'includes/shap-pes-include.md' %}

 ![](images/wb-exp-eval-34.png)
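The property the insight relies on can be shown in a few lines: for each row, the SHAP values sum to the difference between that row's prediction and the average prediction. The feature names and values below are made up for illustration:

```python
# SHAP additivity: per-row contributions sum to the difference between the
# row's prediction and the average (baseline) prediction.
average_prediction = 0.30
shap_values = {"age": 0.12, "income": -0.05, "tenure": 0.08}

prediction = average_prediction + sum(shap_values.values())
# 0.30 + 0.12 - 0.05 + 0.08 = 0.45: each value quantifies how much that
# feature pushed this row's prediction above or below the average.
```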


### Stability {: #stability }

The [Stability](stability) tab provides an at-a-glance summary of how well a model performs on different backtests. It helps to measure performance and gives an indication of how long a model can be in production (how long it is "stable") before needing retraining. The values in the chart represent the validation scores for each backtest and the holdout.

![](images/wb-exp-eval-20.png)

{% include 'includes/wb-compliance-experiment.md' %}

## Manage experiments {: #manage-experiments }

At any point after models have been built, you can manage an individual experiment from within its Use Case. Click the three dots to the right of the experiment name to delete it. To share the experiment, use the Use Case [**Manage members**](wb-build-usecase#share) tool, which shares the experiment along with other associated assets.

![](images/wb-exp-15.png)

## What's next? {: #whats-next }

After selecting a model, you can, from within the experiment:

* [Add models to experiments](ml-experiment-add){ target=_blank }.
* [Make predictions](wb-predict){ target=_blank }.
* [Create No-Code AI Apps](wb-apps/index){ target=_blank }.
* [Generate a compliance report](#compliance-documentation).
